Use fsck to detect corrupt file systems in reproducers #5518
Architecturally this looks good to me. cc @a-nogikh We need to figure out what to do with error handling. Nobody is reading manager logs. It can log at the error level, and then these surface in our log alerting, but if it happens repeatedly, that may be a problem. Is there any rate-limiting in our log alerting? Also, if I now see "mount_in_repro", I don't really know whether it means non-corrupted or that fsck failed. Do we want "mount_in_repro_non_corrupted"? Maybe we could also append any errors to the log file. If there is a failure with fsck, we will always want to know more (why/how it failed).
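One way to picture the ambiguity raised above is a three-state fsck outcome that maps to distinct labels, so that a bare "mount_in_repro" never doubles as "fsck said clean". This is only a sketch: the `FsckStatus` type, its constants, the `mountLabel` helper, and the "mount_in_repro_corrupted" label are hypothetical names invented for illustration, not identifiers from the PR.

```go
package main

import "fmt"

// FsckStatus is a hypothetical three-state outcome of checking an image.
type FsckStatus int

const (
	// FsckUnknown covers "fsck was not run" and "fsck itself failed".
	FsckUnknown FsckStatus = iota
	// FsckClean means fsck ran and reported no corruption.
	FsckClean
	// FsckCorrupted means fsck ran and reported corruption.
	FsckCorrupted
)

// mountLabel returns a label that keeps the three cases apart.
// Only "mount_in_repro" and "mount_in_repro_non_corrupted" appear in the
// discussion; "mount_in_repro_corrupted" is an assumption of this sketch.
func mountLabel(s FsckStatus) string {
	switch s {
	case FsckClean:
		return "mount_in_repro_non_corrupted"
	case FsckCorrupted:
		return "mount_in_repro_corrupted"
	default:
		return "mount_in_repro"
	}
}

func main() {
	for _, s := range []FsckStatus{FsckUnknown, FsckClean, FsckCorrupted} {
		fmt.Println(mountLabel(s))
	}
}
```

With this shape, a failed fsck run falls back to the plain label, and the failure details can still be appended to the log separately.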
Regarding how we could store this information in the DB and how we could check for corrupted images in the reporting rules (the question that popped up in offline discussions).
I'd vote for not splitting the
syzkaller/dashboard/app/entities_datastore.go Lines 58 to 62 in 571351c
If we want to store fsck output logs, that could also go there as a text reference, like in other places.
syzkaller/dashboard/app/entities_datastore.go Line 355 in 571351c
I think that should be quite straightforward implementation-wise. Checking for corrupted images in the Reporting filter is unfortunately much more tricky.
syzkaller/dashboard/app/config.go Lines 274 to 275 in 571351c
syzkaller/dashboard/app/config.go Line 369 in 571351c
First, one important question. There can be multiple crashes per bug, some of which may include corrupted images and some of which may not. Sometimes there are also some unrelated
What kinds of report filtering do we want to support w.r.t. fsck results?
The answer to this question may affect the design considerations. If we go with 3, we might also want to recalculate the crash priority depending on whether the associated reproducer mounts any corrupted fs images. See this method:
syzkaller/dashboard/app/api.go Line 910 in 571351c
It looks like we might have to transform the
In the DB, we always reference the exact crash used in the specific reporting stage:
syzkaller/dashboard/app/entities_datastore.go Line 323 in 571351c
So whenever we have to look at the previous stages, e.g. in the case below, we could pre-fetch all referenced crashes:
syzkaller/dashboard/app/reporting.go Line 456 in 571351c
There's a helper that makes it simpler:
syzkaller/dashboard/app/entities_datastore.go Lines 1122 to 1123 in 571351c
In the case when we decide whether to report or not and don't yet have
syzkaller/dashboard/app/reporting.go Lines 78 to 84 in 571351c
b78995a to 93e9e62
As part of google#5518, I'm adding fsck logs as annotation to the mounted file system assets. For this, I need a variety of fsck-like commands in the ci environment as well as eventually in the production environment.
Could you have another look? I think we are now in a much better state in terms of the logic to run fsck (I extended the syscall descriptions of file system mounts and expanded my test range to 16 different file systems). But I'm still a bit unsure about the database storage story (I used a uint64 log entity but also piggy-backed on the assets list to render it in the UI, so it's a bit ugly that I had to introduce an asset type for something that is not actually uploaded like the other assets). Also, I haven't explored much of the testing story yet since it's all over the place; let me know if you have any tests in mind you'd like to see. The tests Should (TM) pass once #5549 is submitted and the env docker image is published.
9120da8 to feda84b
a-nogikh
left a comment
Looks good to me!
I've commented on the logic that is to be tested and what still needs to be adjusted.
d2a9001 to 6d08674
Syscall attributes are extended with a fsck command field which lets file system mount definitions specify a fsck-like command to run. This is required because every file system has its own fsck command invocation style. When uploading a compressed image asset to the dashboard, syz-manager also runs the fsck command and sends its output over the dashapi. The dashboard stores these fsck logs in the database. This has been requested by fs maintainer Ted Ts'o, who would like to quickly understand whether a file system is corrupted or not before looking at a reproducer in more detail. Ultimately, this could be used as an early triage signal to determine whether a bug is obviously critical.
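As a rough illustration of the flow described above (a per-file-system command, run against the image, with its output captured for reporting), here is a minimal Go sketch. It is not the code from this PR: the `FsckResult` type, the `buildFsckArgs` and `runFsck` helpers, and the `$image` placeholder convention are all assumptions made for the example, and the real attribute syntax may differ.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// FsckResult is a hypothetical container for what could be reported.
type FsckResult struct {
	Output   string // combined stdout/stderr, or the start-up error text
	ExitCode int    // exit code of the fsck-like command, if it ran
	Ran      bool   // false if the command could not be started at all
}

// buildFsckArgs splits a command template on whitespace and substitutes
// the image path for the "$image" placeholder (an assumed convention).
func buildFsckArgs(template, image string) []string {
	args := strings.Fields(template)
	for i, arg := range args {
		if arg == "$image" {
			args[i] = image
		}
	}
	return args
}

// runFsck runs the command and keeps "fsck reported something" (non-zero
// exit) separate from "fsck could not be run" (e.g. binary not installed).
func runFsck(template, image string) FsckResult {
	args := buildFsckArgs(template, image)
	if len(args) == 0 {
		return FsckResult{Output: "empty fsck command"}
	}
	out, err := exec.Command(args[0], args[1:]...).CombinedOutput()
	var exitErr *exec.ExitError
	switch {
	case err == nil:
		return FsckResult{Output: string(out), Ran: true}
	case errors.As(err, &exitErr):
		return FsckResult{Output: string(out), ExitCode: exitErr.ExitCode(), Ran: true}
	default:
		return FsckResult{Output: err.Error()}
	}
}

func main() {
	// Example template for ext4; -n asks e2fsck to make no changes.
	res := runFsck("fsck.ext4 -n $image", "/tmp/image.img")
	fmt.Printf("ran=%v exit=%d\n%s\n", res.Ran, res.ExitCode, res.Output)
}
```

How a non-zero exit code should be interpreted is left out on purpose, since exit code conventions are not uniform across fsck-like tools.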
This is a draft at the moment because I have a few unresolved questions such as: